In recent years there has been growing interest in developing streaming algorithms for efficient processing and querying of continuous data streams. These algorithms seek to provide accurate results while minimizing both the required storage and the processing time, at the cost of a small inaccuracy in their output. A fundamental query of interest is the intersection size of two big data streams. This problem arises in many application areas, such as network monitoring, database systems, data integration, and information retrieval. In this paper we develop a new algorithm for this problem, based on the Maximum Likelihood (ML) method. We show that this algorithm outperforms all previously known schemes and that it asymptotically achieves the optimal variance.
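The ML estimator itself is developed in the body of the paper. As a rough illustration of the estimation task only (this is a classical MinHash-based sketch, not the paper's algorithm), one can estimate the intersection size of two streams from small per-stream summaries: each stream keeps the minimum value under k independent hash functions, the fraction of matching minima estimates the Jaccard similarity J, and |A ∩ B| = J(|A| + |B|)/(1 + J). All names below are illustrative.

```python
import hashlib

def _h(item: str, salt: int) -> int:
    """Deterministic 64-bit hash of item under a given salt."""
    d = hashlib.blake2b(item.encode(), digest_size=8,
                        salt=salt.to_bytes(8, "big")).digest()
    return int.from_bytes(d, "big")

class MinHashSketch:
    """Keeps the minimum hash value per salt; a common building block
    for stream intersection estimates (not the paper's ML estimator)."""
    def __init__(self, k: int):
        self.k = k
        self.mins = [1 << 64] * k

    def add(self, item: str) -> None:
        for i in range(self.k):
            v = _h(item, i)
            if v < self.mins[i]:
                self.mins[i] = v

def estimate_intersection(sa: MinHashSketch, sb: MinHashSketch,
                          n_a: int, n_b: int) -> float:
    """Estimate |A ∩ B| given the two sketches and the stream
    cardinalities (assumed known or tracked separately)."""
    j = sum(a == b for a, b in zip(sa.mins, sb.mins)) / sa.k
    return j * (n_a + n_b) / (1 + j)

# Two streams with 1000 distinct items each and a true overlap of 500.
sa = MinHashSketch(128)
for i in range(1000):
    sa.add(f"x{i}")
sb = MinHashSketch(128)
for i in range(500, 1500):
    sb.add(f"x{i}")
est = estimate_intersection(sa, sb, 1000, 1000)
```

The estimator's error shrinks as k grows (the Jaccard estimate has variance roughly J(1-J)/k); the paper's contribution is an ML-based scheme whose variance is asymptotically optimal, which this simple sketch does not attain.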